A Comparison of Two Corpus-Based Methods for Translingual Information

Author

  • Michael L. Littman
Abstract

In translingual information retrieval (TIR), ad hoc queries in any of a set of languages can be used to retrieve documents in any of a set of languages. Classical information-retrieval methods such as the vector-space model cannot be applied to TIR because they base similarity on the overlap of terms between queries and documents; this overlap is typically zero in TIR. The generalized vector-space model (GVSM) and latent semantic indexing (LSI) are two variations of the vector-space model that make comparisons outside of term space. For this reason, both can be and have been applied to TIR. In this paper, we report on a series of experiments comparing the performance of GVSM and LSI on monolingual and translingual retrieval tasks. We find that the performance of both methods depends crucially on parameter settings, that LSI performs better, and that GVSM runs more quickly.
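
The term-overlap problem described in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulation of either method, so everything below is an assumption made for illustration: the tiny English/Spanish training corpus is hypothetical, and the `doc_space` helper follows one common description of GVSM, in which a text is represented by its dot products with a set of training documents rather than by its terms. It is not the experimental setup of the paper.

```python
import math
from collections import Counter


def bag(text):
    """Lowercased bag-of-words term counts."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical parallel training corpus: (English, Spanish) document pairs.
TRAINING = [
    ("the cat sleeps", "el gato duerme"),
    ("the dog runs", "el perro corre"),
    ("stock markets fall", "los mercados caen"),
]


def doc_space(text, side):
    """Represent text by its dot product with each training document.

    side=0 compares against the English halves, side=1 against the
    Spanish halves (an assumed GVSM-style mapping, not the paper's code).
    """
    terms = bag(text)
    return {
        i: float(sum(count * bag(pair[side]).get(term, 0)
                     for term, count in terms.items()))
        for i, pair in enumerate(TRAINING)
    }


query = "cat sleeps"            # English query
doc_relevant = "gato duerme"    # Spanish document on the same topic
doc_other = "mercados caen"     # Spanish document on another topic

# Plain vector-space model: the two languages share no terms.
term_sim = cosine(bag(query), bag(doc_relevant))

# Comparison outside of term space, through the training documents.
gvsm_rel = cosine(doc_space(query, 0), doc_space(doc_relevant, 1))
gvsm_other = cosine(doc_space(query, 0), doc_space(doc_other, 1))

print(term_sim, gvsm_rel, gvsm_other)
```

In this toy setting the term-space similarity between the query and the relevant document is zero, while the comparison routed through the training pairs separates the relevant document from the unrelated one. LSI would go one step further and compare texts in a reduced-dimension space derived from the training matrix; that step is omitted here.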

Similar resources

Corpus based coreference resolution for Farsi text

"Coreference resolution", or finding all expressions that refer to the same entity in a text, is one of the important requirements in natural language processing. Two words are coreferent when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...

The Impact of Teaching Corpus-based Collocation on EFL Learners' Writing Ability

Abstract The present study explores the impact of corpus-based collocation instruction on intermediate Iranian EFL learners' writing ability. For this study, 84 Iranian learners, studying English as a foreign language in Bayan Institute, Iran, were selected and were randomly divided into two groups, experimental and control. Conventional methods of writing instruction were taught to the control...

A New Method for Named Entity Extraction in Classical Arabic

In Natural Language Processing (NLP) studies, developing resources and tools contributes to the extension and effectiveness of research in each language. In recent years, Arabic Named Entity Recognition (ANER) has attracted the attention of NLP researchers because of its significant impact on improving other NLP tasks such as machine translation, information retrieval, question answering, query result...

Publication date: 2000